Abstract. We study the problem of reconstructing a hidden graph given access to a distance oracle. We design randomized algorithms for the following problems: reconstruction of a degree bounded graph with query complexity $\tilde{O}(n^{3/2})$; reconstruction of a degree bounded outerplanar graph with query complexity $\tilde{O}(n)$; and near-optimal approximate reconstruction of a general graph.

1 Introduction 







Decentralized networks (such as the Internet or sensor networks) raise algorithmic problems different from static, centrally planned networks. A challenge is the lack of accurate maps for the topology of these networks, due to their dynamical structure and to the lack of centralized control. How can we achieve an accurate picture of the topology with minimal overhead? This problem has recently received attention (see e.g., [4,8,10,12]).

For Internet networks, the topology can be investigated at the router and autonomous system (AS) level, where the set of routers (ASs) and their physical connections (peering relations) are the vertices and edges of a graph, respectively. Traditionally, inference of routing topology has relied on tools such as traceroute and mtrace to generate path information. However, these tools require cooperation of intermediate nodes or routers to generate messages. Increasingly, routers block traceroute requests due to privacy and security concerns, so inference of topology increasingly relies on delay information rather than on the route itself. At this level of generality, many problems are provably intractable [2], thus suggesting the need to study related but simpler questions. In this paper, for simplicity we assume that we have access to every vertex in the graph, and only the edges are unknown.

The problem. Consider the shortest path metric $\delta(\cdot,\cdot)$ of a connected, unweighted graph $G = (V, E)$, where $|V| = n$. In our computational model, we are given the vertex set $V$, and we have access to $\delta$ via a query oracle Query$(\cdot,\cdot)$ which, upon receiving a query $(u, v) \in V^2$, returns $\delta(u, v)$. The metric reconstruction problem is to find the metric $\delta$ on $V$. The efficiency of a reconstruction algorithm is measured by its query complexity, i.e., the number of queries to the oracle. (We focus on query complexity, but our algorithms can also easily be implemented in polynomial time and space.)

Note that finding $\delta$ is equivalent to finding every edge in $E$; thus this problem is also called the graph reconstruction problem.
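To make the query model concrete, the following Python sketch simulates the oracle on a known graph and counts queries. The class `DistanceOracle`, its BFS-based simulation, and the helper `reconstruct_exhaustively` are illustrative assumptions, not part of the paper, which only posits an abstract Query$(\cdot,\cdot)$. The helper is the trivial algorithm that queries all $\binom{n}{2}$ pairs and keeps those at distance 1, which also shows why knowing $\delta$ is enough to recover $E$.

```python
from collections import deque


class DistanceOracle:
    """Hypothetical simulated oracle for a hidden connected, unweighted graph.

    Only the vertex list and query(u, v) are meant to be visible to a
    reconstruction algorithm; every call to query() is counted.
    """

    def __init__(self, adj):
        self._adj = adj                # hidden adjacency lists: vertex -> neighbors
        self.vertices = list(adj)      # the vertex set V is given
        self.queries = 0
        self._bfs_cache = {}           # simulation detail, not charged as queries

    def _bfs(self, src):
        dist, frontier = {src: 0}, deque([src])
        while frontier:
            x = frontier.popleft()
            for y in self._adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    frontier.append(y)
        return dist

    def query(self, u, v):
        """Return delta(u, v), charging one query."""
        self.queries += 1
        if u not in self._bfs_cache:
            self._bfs_cache[u] = self._bfs(u)
        return self._bfs_cache[u][v]


def reconstruct_exhaustively(oracle):
    """Trivial baseline: query all n(n-1)/2 pairs and keep those at distance 1."""
    vs = oracle.vertices
    return {(u, v) for i, u in enumerate(vs) for v in vs[i + 1:]
            if oracle.query(u, v) == 1}


# A 4-cycle 0-1-2-3-0.
oracle = DistanceOracle({0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]})
print(sorted(reconstruct_exhaustively(oracle)), oracle.queries)
# [(0, 1), (0, 3), (1, 2), (2, 3)] 6
```

The algorithms below aim to beat this $\Theta(n^2)$ baseline on restricted graph classes.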

Related work. Reyzin and Srivastava [18] showed an $\Omega(n^2)$ lower bound for the graph reconstruction problem on general graphs. We extend their result to get a lower bound for the graph approximate reconstruction problem.

To reconstruct graphs of bounded degree, we apply some algorithmic ideas previously developed for compact routing [21] and ideas for Voronoi cells [15].

A closely related model in network discovery and verification provides queries which, upon receiving a node $q$, return the distances from $q$ to all other nodes in the graph [12], instead of the distance between a pair of nodes as in our model. The problem of minimizing the number of queries is NP-hard and admits an $O(\log n)$-approximation algorithm (see [12]). In another model, a query at a node $q$ returns all edges on all shortest paths from $q$ to any other node [3]. Network tomography also proposes statistical models [6,20].

Our results. In Section 2 we consider the reconstruction problem on graphs of bounded degree. We provide a randomized algorithm to reconstruct such a graph with query complexity $\tilde{O}(n^{3/2})$. Our algorithm selects a set of nodes (called centers) of expected size $\tilde{O}(\sqrt{n})$, so that they separate the graph into $\tilde{O}(\sqrt{n})$ slightly overlapping subgraphs, each of size $\tilde{O}(\sqrt{n})$. We show that the graph reconstruction problem reduces to reconstructing every subgraph, which can be done in $\tilde{O}(n)$ queries by exhaustive search inside the subgraph.

In Section 3 we consider outerplanar graphs of bounded degree. An outerplanar graph is a graph which can be embedded in the plane with all vertices on the exterior face. Chartrand and Harary [7] first introduced outerplanar graphs and proved that a graph is outerplanar if and only if it contains no subgraph homeomorphic to $K_4$ or $K_{2,3}$. Outerplanar graphs have received much attention in the literature because of their simplicity and numerous applications. In this paper, we show how to reconstruct degree bounded outerplanar graphs with expected query complexity $\tilde{O}(n)$. The idea is to find the node $x$ which appears most often among all shortest paths (between every pair of nodes), and then partition the graph into components with respect to $x$. We will show that such a partition is $\beta$-balanced for some constant $\beta < 1$, i.e., each resulting component contains at most a $\beta$ fraction of the graph. Such partitioning allows us to reconstruct the graph recursively with $O(\log n)$ levels of recursion. However, computing all shortest paths in order to get $x$ takes too many queries. Instead, we consider an approximate version of $x$: we compute a sample of shortest paths and take the node which is most often visited among all sampled shortest paths. We will show that the node obtained in this way provides a $\beta$-balanced partition with high probability. Our algorithm for outerplanar graphs gives an $O(\Delta \cdot n \log^3 n)$ bound which, for a tree (a special case of an outerplanar graph), is only slightly worse than the optimal algorithm for trees with query complexity $O(\Delta \cdot n \log n)$ (see [11]). On the other hand, the tree model typically restricts queries to pairs of tree leaves, whereas we allow queries of any pair of vertices, not just leaves.

In Section 4 we consider an approximate version of the metric reconstruction problem for general graphs. The metric $\hat{\delta}$ is an $f$-approximation of the metric $\delta$ if for every pair of nodes $(u, v)$, $\hat{\delta}(u, v) \le \delta(u, v) \le f \cdot \hat{\delta}(u, v)$, where $f$ is any sublinear function of $n$. We give a simple algorithm to compute an $f$-approximation of the metric with expected query complexity $O(n^2 (\log n)/f)$. We show that our algorithm is near-optimal by providing an $\Omega(n^2/f)$ query lower bound.

An open question is whether the $\tilde{O}(n^{3/2})$ bound in Theorem 1 is tight.

Other models. The problem of reconstructing an unknown graph by queries 
that reveal partial information has been studied extensively in many different 
contexts, independently stemming from a number of applications. 

In evolutionary biology, the goal is to reconstruct evolutionary trees, thus the hidden graph has a tree structure. One may query a pair of species and get in return the distance between them in the (unknown) tree [22]. See for example [14,16,19]. In this paper, we assume that our graph is not necessarily a tree, but may have an arbitrary connected topology.

Another graph reconstruction problem is motivated by DNA shotgun sequencing and the linkage discovery problem in artificial intelligence [1]. In this model we have access to an oracle which receives a subset of vertices and returns the number of edges whose endpoints are both in this subset. This model has been much studied (e.g., [3,9,13,18]) and an optimal algorithm has been found in [17]. Our model is different since there is no counting.

Geometric reconstruction deals with, for example, reconstructing a curve from a sample of points [11] or reconstructing a road network from a given collection of path traces [8]. In contrast, our problem contains no geometry, so the results are incomparable.



2 Degree Bounded Graphs 

Theorem 1. Assume that the graph $G$ has bounded degree $\Delta$. Then we have a randomized algorithm for the metric reconstruction problem, with query complexity $O(\Delta^4 \cdot n^{3/2} \cdot \log^2 n \cdot \log\log n)$, which is $\tilde{O}(n^{3/2})$ when $\Delta$ is constant.

Our reconstruction proceeds in two phases. 

In the first phase, we follow the notation of Thorup and Zwick [21]: let $A \subseteq V$ be a subset of vertices called centers. For $v \in V$, let $\delta(A, v) = \min\{\delta(u, v) \mid u \in A\}$ denote the distance from $v$ to the closest node in $A$. For every $w \in V$, let the cluster of $w$ with respect to the set $A$ be defined by $C_w = \{v \in V \mid \delta(w, v) < \delta(A, v)\}$. Thus for $w \notin A$, $C_w$ is the set of the vertices whose (strictly) closest node in $A \cup \{w\}$ is $w$. Algorithm Modified-Center$(V, s)$, which is randomized, takes as input the vertex set $V$ and a parameter $s \in [1, n]$, and returns a subset $A \subseteq V$ of vertices such that all clusters $C_w$ (for all $w \in V$) are of size at most $6n/s$. $A$ has expected size at most $2s \log n$, and the expected number of queries is $O(s \cdot n \cdot \log^2 n \cdot \log\log n)$. This algorithm applies, in a different context, ideas from [21], except that we use sampling to compute an estimate of $|C_w|$.

In the second phase, Algorithm Local-Reconstruction$(V, A)$ takes as input the vertex set $V$ and the set $A$ computed by Modified-Center$(V, s)$, and returns the edge set of $G$. It partitions the graph into slightly overlapping components according to the centers in $A$, and proceeds by exhaustive search within each component. Inspired by the Voronoi diagram partitioning in [15], we show that these components together cover every edge of the graph. The expected query complexity in this phase is $O(s \log n \, (n + \Delta^4 (n/s)^2))$.

Letting $s = \sqrt{n}$, the expected total number of queries in the two phases is $O(\Delta^4 \cdot n^{3/2} \cdot \log^2 n \cdot \log\log n)$.

We use the notation Query$(A, v)$ to mean Query$(a, v)$ for every $a \in A$, and the notation Query$(A, B)$ to mean Query$(a, b)$ for every $a \in A$ and $b \in B$.



Modified-Center(V, s)
1   A ← ∅, W ← V
2   T ← K · log n · log log n   (K = O(1) to be defined later)
3   while W ≠ ∅
4       do A′ ← random subset of W s.t. every node has prob. s/|W|
5          Query(A′, V)
6          A ← A ∪ A′
7          for w ∈ W
8              do X ← random multi-subset of V with s · T elements
9                 Query(X, w)
10                let Ĉ_w ← |X ∩ C_w| · n/|X|
11         W ← {w ∈ W : Ĉ_w > 5n/s}
12  return A
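The pseudocode above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: `query(u, v)` is any function returning $\delta(u, v)$, the constant `K` and the random source `rng` are hypothetical parameters (the paper only requires $K = O(1)$), and natural logarithms are used. The test in lines 8-10 needs $\delta(x, A)$, which is available because Query$(A', V)$ has been made for every center added so far.

```python
import math
import random


def modified_center(vertices, query, s, K=2.0, rng=None):
    """Sketch of Modified-Center(V, s): returns a set A of centers.

    query(u, v) returns delta(u, v). K (the constant in T) and rng are
    illustrative choices, not values fixed by the paper.
    """
    rng = rng or random.Random()
    vertices = list(vertices)
    n = len(vertices)
    log_n = max(math.log(n), 1.0)
    T = max(1, round(K * log_n * max(math.log(log_n), 1.0)))   # line 2
    A, W = set(), set(vertices)
    d_to_A = {v: math.inf for v in vertices}    # delta(A, v) for the current A
    while W:                                    # line 3
        p = min(1.0, s / len(W))
        A_new = {w for w in W if rng.random() < p}   # line 4
        for a in A_new:                         # line 5: Query(A', V)
            for v in vertices:
                d_to_A[v] = min(d_to_A[v], query(a, v))
        A |= A_new                              # line 6
        survivors = set()
        for w in W:                             # line 7
            X = [rng.choice(vertices) for _ in range(int(s * T))]   # line 8
            # Line 9: one query per element of X; x is in C_w iff delta(x, w) < delta(x, A).
            hits = sum(1 for x in X if query(x, w) < d_to_A[x])
            estimate = hits * n / len(X)        # line 10: estimate of |C_w|
            if estimate > 5 * n / s:            # line 11
                survivors.add(w)
        W = survivors
    return A


# Usage on a cycle of 60 vertices with s = 8 (values chosen for illustration).
N = 60
cycle_query = lambda u, v: min(abs(u - v), N - abs(u - v))
A = modified_center(range(N), cycle_query, s=8, rng=random.Random(1))
print(len(A))
```

The point of the sampling in lines 8-10 is cost: each $w$ is charged $s \cdot T$ queries instead of the $n$ queries an exact computation of $|C_w|$ would need.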



Local-Reconstruction(V, A)
1   E ← ∅
2   for a ∈ A
3       do B_a ← {v ∈ V | δ(v, a) ≤ 2}
4          Query(B_a, V)
5          D_a ← B_a
6          for b ∈ B_a
7              do D_a ← D_a ∪ {v ∈ V | δ(b, v) < δ(A, v)}
8          Query(D_a, D_a)
9          E ← E ∪ {(d_1, d_2) ∈ D_a × D_a : δ(d_1, d_2) = 1}
10  return E
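A Python sketch of Local-Reconstruction follows. It assumes a function `query(u, v)` that returns $\delta(u, v)$ and computes $\delta(A, v)$ up front; in the full algorithm these distances are already known from the first phase. The names `local_reconstruction` and `bfs_table` are illustrative. Since the covering argument of Lemma 3 holds for any nonempty $A$, the demo checks exact recovery on a $5 \times 5$ grid with three arbitrary centers; the bounds on $|A|$ and $|C_w|$ only affect the number of queries.

```python
from collections import deque


def local_reconstruction(vertices, query, centers):
    """Sketch of Local-Reconstruction(V, A): returns the edge set E as sorted pairs."""
    vertices = list(vertices)
    d_to_A = {v: min(query(a, v) for a in centers) for v in vertices}   # delta(A, v)
    edges = set()
    for a in centers:                                            # line 2
        B = [v for v in vertices if query(v, a) <= 2]            # line 3: ball of radius 2
        dist_B = {b: {v: query(b, v) for v in vertices} for b in B}   # line 4
        D = set(B)                                               # line 5
        for b in B:                                              # lines 6-7: add cluster C_b
            D |= {v for v in vertices if dist_B[b][v] < d_to_A[v]}
        D = sorted(D)
        for i, d1 in enumerate(D):                               # lines 8-9: exhaustive search
            for d2 in D[i + 1:]:
                if query(d1, d2) == 1:
                    edges.add((d1, d2))
    return edges


def bfs_table(adj):
    """All-pairs BFS distances; plays the role of the hidden oracle in the demo."""
    table = {}
    for src in adj:
        dist, frontier = {src: 0}, deque([src])
        while frontier:
            x = frontier.popleft()
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    frontier.append(y)
        table[src] = dist
    return table


# 5x5 grid (maximum degree 4); vertex r*5 + c.
grid = {
    r * 5 + c: [rr * 5 + cc
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= rr < 5 and 0 <= cc < 5]
    for r in range(5) for c in range(5)
}
table = bfs_table(grid)
found = local_reconstruction(grid, lambda u, v: table[u][v], {0, 12, 24})
true_edges = {(min(u, v), max(u, v)) for u in grid for v in grid[u]}
print(found == true_edges)  # True
```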

Figure 1 gives an illustration of Algorithm Local-Reconstruction$(V, A)$. Vertices $a_1, \ldots, a_5$ are centers in $A$ and define subsets $D_{a_1}, \ldots, D_{a_5}$, which overlap slightly. We will show in Lemma 3 that the subsets $D_a$ (for all $a \in A$) together cover every edge in $E$. Thus the local reconstruction over every $D_a$ (for $a \in A$) is sufficient to reconstruct the graph.




Fig. 1. Partition by centers 



Theorem 1 follows from Lemmas 2 and 3.



Lemma 2. With probability at least $1/(4e)$, the Modified-Center$(V, s)$ algorithm takes $O(s \cdot n \cdot \log^2 n \cdot \log\log n)$ queries and returns a set $A$ of size at most $4s \log n$ such that $|C_w| \le 6n/s$ for every $w \in V$.



Remark. The difference between our algorithm and algorithm Center$(G, s)$ in [21] is that Center$(G, s)$ eliminates $w \in W$ when $|C_w| < 4n/s$ by calculating $|C_w|$ exactly, which needs $n$ queries in our model; our algorithm instead estimates $|C_w|$ using $O(s \cdot \log n \cdot \log\log n)$ queries, so that with high probability it eliminates $w \in W$ when $|C_w| < 4n/s$ and it does not eliminate $w \in W$ when $|C_w| > 6n/s$.

Proof. Fix $A$ and $w$ and let $Y_w = |X \cap C_w| = |\{x \in X \mid \delta(x, w) < \delta(x, A)\}|$. The expected value of $Y_w$ is $|C_w| \cdot |X|/n$. Since $X$ is random, by standard Chernoff bounds there is a constant $K$ such that, for any node $w$,

$$\Pr[Y_w > 5T] \ge 1 - \frac{1}{4n\log n} \quad \text{if } |C_w| > 6n/s \text{ (and thus } E[Y_w] > 6T),$$
$$\Pr[Y_w < 5T] \ge 1 - \frac{1}{4n\log n} \quad \text{if } |C_w| < 4n/s \text{ (and thus } E[Y_w] < 4T).$$

Let $\hat{C}_w = Y_w \cdot n/|X|$, where $|X| = s \cdot T$. When the number of nodes $w$ in estimation is at most $4n\log n$, with probability at least $(1 - 1/(4n\log n))^{4n\log n} \approx 1/e$, we have, for every $w$ in estimation,

$$\begin{cases} \hat{C}_w > 5n/s, & \text{if } |C_w| > 6n/s, \\ \hat{C}_w < 5n/s, & \text{if } |C_w| < 4n/s. \end{cases} \qquad (1)$$

We assume that $n$ is large enough that this probability is at least $1/(2e)$.

Using the same proof as that of Theorem 3.1 in [21], we can prove that under condition (1), algorithm Modified-Center$(V, s)$ executes an expected number of at most $2\log n$ iterations of the while loop and returns a set $A$ of expected size at most $2s\log n$ such that $|C_w| \le 6n/s$ for every $w \in V$. Thus with probability at least $1/2$, the algorithm executes at most $4\log n$ iterations of the while loop and the set $A$ is of size at most $4s\log n$. The number of queries is $O(s \cdot n \cdot \log^2 n \cdot \log\log n)$ in this case, since every iteration takes $O(s \cdot n \cdot \log n \cdot \log\log n)$ queries. So the lemma follows. $\square$

Lemma 3. Under the conditions that $|A| \le 4s\log n$ and $|C_w| \le 6n/s$ for every $w \in V$, Algorithm Local-Reconstruction$(V, A)$ finds all edges in the graph using $O(s\log n \, (n + \Delta^4 (n/s)^2))$ queries.

Proof. Let $D_a = B_a \cup \bigcup_{b \in B_a} C_b$. We will prove that for every edge $(u, v)$ in $E$, there is some $a \in A$ such that $u$ and $v$ are both in $D_a$. Thus the algorithm is correct: it finds all edges in $E$.

Consider $(u, v) \in E$. Without loss of generality, we assume $\delta(A, u) \le \delta(A, v)$. Let $a \in A$ be such that $\delta(a, u) = \delta(A, u)$. We will show that $u$ and $v$ are both in $D_a$. When $\delta(a, u) \le 1$, we have $\delta(a, v) \le 2$, so $u$ and $v$ are both in $B_a \subseteq D_a$. So we consider only $\delta(a, u) \ge 2$. Take $b$ to be the node, on any of the shortest paths from $a$ to $u$, such that $\delta(a, b) = 2$. Then $\delta(b, u) = \delta(a, u) - 2$ and $\delta(b, v) \le \delta(b, u) + \delta(u, v) = \delta(a, u) - 1$ by the triangle inequality. Using $\delta(a, u) = \delta(A, u) \le \delta(A, v)$, we have $\delta(b, u) < \delta(A, u)$ and $\delta(b, v) < \delta(A, v)$. So $u$ and $v$ are both in $C_b$, which is a subset of $D_a$ since $b \in B_a$.

Because every $D_a$ (for $a \in A$) has size $O(\Delta^2 \cdot n/s)$ (as $|B_a| = O(\Delta^2)$ and each $C_b$ has size at most $6n/s$), the total query complexity is $O(s\log n \, (n + \Delta^4 (n/s)^2))$. $\square$



3 Degree Bounded Outerplanar Graphs 

In this section, we consider the connected graph $G = (V, E)$ to be outerplanar [7] and of bounded degree $\Delta$. We show how to reconstruct such a graph with expected query complexity $\tilde{O}(n)$. Generally speaking, we partition the graph into balanced-size subgraphs and recursively reconstruct these subgraphs.



3.1 Self-contained Subsets, Polygons and Partitions 

Before giving details of the algorithm, we first need some new notions. 

Definition 4. The subset $U \subseteq V$ is said to be self-contained if, for every $(x, y) \in U \times U$, any shortest path in $G$ between $x$ and $y$ contains nodes only in $U$.
For every subset $U \subseteq V$, let $G[U]$ denote the subgraph induced by $U$, i.e., $G[U]$ has exactly the edges of $G$ with both endpoints in $U$. It is easy to see that for every self-contained subset $U$, $G[U]$ is outerplanar and connected, and that the intersection of several self-contained subsets is again self-contained.
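Definition 4 can be tested directly when all distances are known, because a node $z$ lies on some shortest $x$-$y$ path exactly when $\delta(x, z) + \delta(z, y) = \delta(x, y)$. The helper below is an illustrative sketch (its name and the full distance table `dist` are assumptions); the example uses a 4-cycle, where two opposite vertices are joined by two shortest paths.

```python
def is_self_contained(U, dist):
    """Definition 4: no shortest path between two nodes of U leaves U.

    dist[x][y] is the full distance table on V.
    """
    U = set(U)
    outside = [z for z in dist if z not in U]
    return all(dist[x][z] + dist[z][y] != dist[x][y]
               for x in U for y in U for z in outside)


# 4-cycle 0-1-2-3-0: the shortest paths from 0 to 2 go through 1 or through 3.
c4 = {u: {v: min(abs(u - v), 4 - abs(u - v)) for v in range(4)} for u in range(4)}
print(is_self_contained({0, 1}, c4), is_self_contained({0, 1, 2}, c4))  # True False
```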

Definition 5. We say that the $k$-tuple $(x_1, \ldots, x_k) \in V^k$ (where $k \ge 3$) forms a polygon if $G[\{x_1, \ldots, x_k\}]$ has exactly $k$ edges: $(x_1, x_2), (x_2, x_3), \ldots, (x_k, x_1)$.
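Definition 5 likewise translates into a direct check on the induced subgraph. The sketch below assumes an explicit adjacency list and distinct $x_i$; `is_polygon` is a hypothetical helper name. In the example, a 4-cycle with the chord $(0, 2)$ contains the polygons $(0, 1, 2)$ and $(0, 2, 3)$, but $(0, 1, 2, 3)$ is not a polygon because the chord is also induced.

```python
def is_polygon(cycle, adj):
    """Definition 5: (x_1, ..., x_k), k >= 3, is a polygon iff the subgraph induced
    by {x_1, ..., x_k} has exactly the k edges (x_1, x_2), ..., (x_k, x_1)."""
    k = len(cycle)
    if k < 3 or len(set(cycle)) != k:    # assume the x_i are distinct
        return False
    nodes = set(cycle)
    induced = {frozenset((u, v)) for u in nodes for v in adj[u] if v in nodes}
    ring = {frozenset((cycle[i], cycle[(i + 1) % k])) for i in range(k)}
    return induced == ring


# 4-cycle 0-1-2-3-0 plus the chord (0, 2).
square_with_chord = {0: [1, 3, 2], 1: [0, 2], 2: [1, 3, 0], 3: [2, 0]}
print(is_polygon((0, 1, 2), square_with_chord),
      is_polygon((0, 1, 2, 3), square_with_chord))  # True False
```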

Definition 6. Let $U$ be a self-contained subset of $V$ and let $U_1, \ldots, U_\eta$ be subsets of $U$. We say that $\{U_1, \ldots, U_\eta\}$ is a partition of $U$ if every $U_i$ is self-contained, and for every edge $(x, y)$ in $G[U]$, there exists some $U_i$ ($1 \le i \le \eta$) such that $x$ and $y$ are both in $U_i$. Let $\beta < 1$ be some constant. The partition $\{U_1, \ldots, U_\eta\}$ of $U$ is said to be $\beta$-balanced if every $U_i$ is of size at most $\beta|U|$.

Given any partition of $U$, the reconstruction problem over $U$ can be reduced to the independent reconstruction over every $U_i$ ($1 \le i \le \eta$).

Let $U$ be a self-contained subset of $V$. For every vertex $v \in U$, its removal would separate $U$ into $n_v$ ($n_v \ge 1$) connected components. For every $i \in [1, n_v]$, let $S^*_{v,i}$ be the set of nodes in the $i$-th component and let $S_{v,i} = S^*_{v,i} \cup \{v\}$. We say that $\{S_{v,1}, \ldots, S_{v,n_v}\}$ is the partition of $U$ by the node $v$.
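When the graph is given explicitly (for instance, to check the output of a query-based algorithm), the partition of $U$ by $v$ is just a connected-components computation in $G[U \setminus \{v\}]$. This sketch assumes an adjacency list `adj`; the function name is illustrative. Appendix A.2 shows how the same partition is obtained with $O(\Delta \cdot |U|)$ distance queries.

```python
def partition_by_node(v, U, adj):
    """Return [S_{v,1}, ..., S_{v,n_v}] with S_{v,i} = S*_{v,i} | {v}, where the
    S*_{v,i} are the connected components of G[U] after removing v."""
    rest = set(U) - {v}
    parts = []
    while rest:
        start = rest.pop()
        comp, stack = {start}, [start]
        while stack:                      # DFS inside G[U \ {v}]
            x = stack.pop()
            for y in adj[x]:
                if y in rest:
                    rest.remove(y)
                    comp.add(y)
                    stack.append(y)
        parts.append(comp | {v})
    return parts


# Two triangles sharing vertex 0, plus a pendant edge 4-5.
bowtie = {0: [1, 2, 3, 4], 1: [0, 2], 2: [0, 1], 3: [0, 4], 4: [0, 3, 5], 5: [4]}
print(sorted(sorted(p) for p in partition_by_node(0, bowtie, bowtie)))
# [[0, 1, 2], [0, 3, 4, 5]]
```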

3.2 Balanced-Partition Algorithm 

Let us now introduce the main algorithm Balanced-Partition$(U)$, which takes as input a self-contained subset $U \subseteq V$ with $|U| \ge 10$ and returns a $\beta$-balanced partition of $U$, for some constant $\beta \in (0.7, 1)$. The algorithm takes a sample of $2\omega$ nodes $(a_1, \ldots, a_\omega, b_1, \ldots, b_\omega)$, where $\omega = C \cdot \log|U|$ for some constant $C > 1$, and tries to find a $\beta$-balanced partition of $U$ under this sample. It stops if it finds such a partition, and otherwise repeatedly tries another sample. Below is the general framework of our algorithm. The details of the algorithmic implementation are given in Appendix A, where we give the constants $C$ and $\beta$.

1. Take a sample of $2\omega$ nodes $(a_1, \ldots, a_\omega, b_1, \ldots, b_\omega)$. For every $i \in [1, \omega]$, compute a shortest path between $a_i$ and $b_i$. Let $x$ be some node with the most occurrences in the $\omega$ paths above.

2. Partition $U$ into $S_{x,1}, \ldots, S_{x,n_x}$ by the node $x$. If all these sets have size at most $\beta|U|$, return $\{S_{x,1}, \ldots, S_{x,n_x}\}$; otherwise let $D = S_{x,k}$ be the largest set among them and let $V_0 = U \setminus S^*_{x,k}$.

3. In the set $D$, compute the neighbors of $x$ in order: $y_1, \ldots, y_\lambda$, where $\lambda \le \Delta$. If $\lambda = 1$, go to Step 1.

4. For every $i \in [1, \lambda]$, partition $U$ into $S_{y_i,1}, \ldots, S_{y_i,n_{y_i}}$ by $y_i$. Let $S_{y_i,k_i}$ be the subset containing $x$ and let $V_i = U \setminus S^*_{y_i,k_i}$ (see Figure 2). If $|V_i| > \beta|U|$, go to Step 1.

5. Let $T = D \cap S_{y_1,k_1} \cap \cdots \cap S_{y_\lambda,k_\lambda}$. Separate $T$ into subsets $T_1, \ldots, T_{\lambda-1}$ as in Figure 2. If every $T_i$ has at most $\beta|U|$ nodes, return $\{T_1, \ldots, T_{\lambda-1}, V_0, \ldots, V_\lambda\}$.

6. Let $T_j$ be the set with more than $\beta|U|$ nodes. Find the unique polygon $(q_1, \ldots, q_l)$ in $T_j$ that goes through nodes $x$, $y_j$ and $y_{j+1}$.

7. For every $i \in [1, l]$, partition $U$ into $S_{q_i,1}, \ldots, S_{q_i,n_{q_i}}$ by $q_i$. Let $S_{q_i,m_i}$ be the subset containing the polygon above and let $W_i = U \setminus S^*_{q_i,m_i}$ (see Figure 3). If $|W_i| > \beta|U|$, go to Step 1.

8. Let $R = S_{q_1,m_1} \cap \cdots \cap S_{q_l,m_l}$. Separate $R$ into subsets $R_1, \ldots, R_l$ as in Figure 3. If some $R_i$ has more than $\beta|U|$ nodes, go to Step 1; else return $\{R_1, \ldots, R_l, W_1, \ldots, W_l\}$.
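Step 1 is the only randomized step of the framework above. The following sketch assumes the graph is available as an adjacency list, so that one shortest path per pair can be found by BFS (the algorithm itself uses Shortest-Path from Appendix A.1 instead, which needs only distance queries). It samples $\omega = \lceil C \ln |U| \rceil$ pairs and returns a node with the most occurrences. The names `bfs_path` and `step1_pick_x` and the default `C = 3` are illustrative.

```python
import math
import random
from collections import Counter, deque


def bfs_path(adj, U, a, b):
    """One shortest a-b path inside G[U]; U is assumed self-contained."""
    parent, frontier = {a: None}, deque([a])
    while frontier:
        x = frontier.popleft()
        if x == b:
            break
        for y in adj[x]:
            if y in U and y not in parent:
                parent[y] = x
                frontier.append(y)
    path = [b]
    while path[-1] != a:
        path.append(parent[path[-1]])
    return path[::-1]


def step1_pick_x(adj, U, C=3.0, rng=None):
    """Step 1: sample omega pairs (a_i, b_i) uniformly from U, take one shortest
    path per pair, and return a node occurring in the most paths."""
    rng = rng or random.Random()
    nodes = list(U)
    U = set(nodes)
    omega = max(1, math.ceil(C * math.log(len(nodes))))
    counts = Counter()
    for _ in range(omega):
        a, b = rng.choice(nodes), rng.choice(nodes)
        counts.update(bfs_path(adj, U, a, b))   # each node counted once per path
    return counts.most_common(1)[0][0]


# A star with center 0 and leaves 1..10: almost every sampled path goes through 0.
star = {0: list(range(1, 11)), **{leaf: [0] for leaf in range(1, 11)}}
print(step1_pick_x(star, star, rng=random.Random(0)))
```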
Fig. 2. Partition by neighbors

Fig. 3. Partition by polygon



In Appendix A we give formal definitions and algorithms for the subproblems: finding a shortest path between two nodes; partitioning $U$ by a given node; obtaining the neighbors of $x$ in order; partitioning $U$ with respect to an edge; and finding the unique polygon that goes through nodes $x$, $y_j$ and $y_{j+1}$. Finally, we give an improved implementation of partitioning $U$ by a polygon (Steps 7-8). All these algorithms use $O(\Delta \cdot |U| \log^2 |U|)$ queries. It is easy to see that the algorithm Balanced-Partition$(U)$ always stops with a $\beta$-balanced partition of $U$.



3.3 From Balanced Partitioning to Graph Reconstruction

Let us show how to reconstruct the graph using Balanced-Partition$(U)$, assuming the following proposition, which will be proved in Section 3.4.

Proposition 7. For any self-contained subset $U \subseteq V$ with $|U| \ge 10$, the randomized algorithm Balanced-Partition$(U)$ returns a $\beta$-balanced partition of $U$ with expected query complexity $O(\Delta \cdot |U| \log^2 |U|)$.

Based on the algorithm Balanced-Partition$(U)$, we reconstruct the graph recursively: we partition the vertex set $V$ into self-contained subsets $V_1, \ldots, V_k$ such that every $V_i$ has size at most $\beta n$; for every $V_i$, if $|V_i| < 10$, we reconstruct $G[V_i]$ using at most $9^2$ queries; otherwise we partition $V_i$ into self-contained subsets of size at most $\beta|V_i| \le \beta^2 n$, and continue with these subsets, etc. Thus the number of levels $L$ of the recursion is $O(\log n)$.

Every time Balanced-Partition$(U)$ returns a partition $\{U_1, \ldots, U_k\}$, we always have $|U_1| + \cdots + |U_k| \le |U| + 2(k - 1)$. For every $1 \le i \le L$, let $U_{i,1}, \ldots, U_{i,M_i}$ be all sets on the $i$-th level of the recursion. We then have $|U_{i,1}| + \cdots + |U_{i,M_i}| \le 3n$. Thus the total query complexity on every level is $O(\Delta \cdot n \log^2 n)$ by Proposition 7. So we have the following theorem.

Theorem 8. Assume that the outerplanar graph $G$ has bounded degree $\Delta$. Then we have a randomized algorithm for the metric reconstruction problem with expected query complexity $O(\Delta \cdot n \log^3 n)$, which is $\tilde{O}(n)$ when $\Delta$ is constant.



3.4 Complexity Analysis of the Balanced-Partition Algorithm 

Now let us prove Proposition 7. Since the query complexity to try every sample is $O(\Delta \cdot |U| \log^2 |U|)$, we only need to prove, as in the following proposition, that for every sample, the algorithm finds a $\beta$-balanced partition with probability at least $2/3$. This guarantees that the expected number of samples is at most $3/2$, which gives the $O(\Delta \cdot |U| \log^2 |U|)$ query complexity in Proposition 7.

Proposition 9. In the algorithm Balanced-Partition$(U)$, every sample $(a_1, \ldots, a_\omega, b_1, \ldots, b_\omega)$ gives a $\beta$-balanced partition with probability at least $2/3$.

To prove Proposition 9, we need Lemmas 10, 11 and 12, whose proofs are in Appendix B.

Lemma 10. Let $(a_1, \ldots, a_\omega, b_1, \ldots, b_\omega)$ be any sample during the algorithm Balanced-Partition$(U)$. Let $x$ be the node computed from this sample in Step 1. We say that a set $S$ is a $\beta$-bad set if it is a self-contained subset of $U$ such that $x \notin S$ and $|S| > \beta|U|$ for some constant $\beta$. Then $x$ does not lead to a $\beta$-balanced partition of $U$ only when there exists some $\beta$-bad set.

For any node $u \in U$, define $p_u$ to be the probability that $u$ is in at least one of the shortest paths between two nodes $a$ and $b$, where $a$ and $b$ are chosen uniformly and independently at random from $U$.
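For small examples, $p_u$ can be computed exactly by enumerating all $|U|^2$ ordered pairs, since $u$ lies on a shortest $a$-$b$ path iff $\delta(a, u) + \delta(u, b) = \delta(a, b)$. The function name and the distance-table input are assumptions for illustration; pairs with $a = b$ are included because $a$ and $b$ are drawn independently.

```python
from fractions import Fraction
from itertools import product


def p_exact(u, U, dist):
    """p_u: probability that u lies on some shortest a-b path, a and b i.i.d. uniform on U."""
    U = list(U)
    hits = sum(1 for a, b in product(U, U) if dist[a][u] + dist[u][b] == dist[a][b])
    return Fraction(hits, len(U) ** 2)


# Path 0-1-2: the middle node lies on a shortest path for 7 of the 9 ordered pairs.
p3 = {a: {b: abs(a - b) for b in range(3)} for a in range(3)}
print(p_exact(1, range(3), p3), p_exact(0, range(3), p3))  # 7/9 5/9
```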

Lemma 11. There exists some constant $\alpha \in (0, 1)$ such that in every outerplanar graph of bounded degree, there is a node $z$ with $p_z \ge \alpha$.

Lemma 12. Let $\omega = C \cdot \log|U|$ (for some constant $C$ to be chosen in the proof). Take a sample of $2\omega$ nodes uniformly and independently at random from $U$. Let them be $a_1, \ldots, a_\omega, b_1, \ldots, b_\omega$. For every $v \in U$, let $\hat{p}_v$ be the fraction of pairs $(a_i, b_i)_{1 \le i \le \omega}$ such that $v$ is in some shortest path between $a_i$ and $b_i$. Let $x$ be some node in $U$ with the largest $\hat{p}_x$. Then with probability at least $2/3$, we have $p_x \ge \alpha/2$, where $\alpha > 0$ is the constant in Lemma 11.



Now we will prove Proposition 9. By Lemma 10, we only need to bound the probability of existence of a $\beta$-bad set. Let $C$ be the constant chosen in Lemma 12. Let $x$ be the node computed from the sample $(a_1, \ldots, a_\omega, b_1, \ldots, b_\omega)$ in Step 1 of Algorithm Balanced-Partition$(U)$. Take $\beta = \sqrt{1 - \alpha/2}$, where the constant $\alpha \in (0, 1)$ is provided by Lemma 11. Then $\beta \in (0.7, 1)$. Suppose there exists a $\beta$-bad set $S$. For every $(a, b) \in S \times S$, no shortest path between $a$ and $b$ can go through $x$, since $S$ is self-contained and $x \notin S$. So $p_x \le 1 - (|S|/|U|)^2 < 1 - \beta^2 = \alpha/2$. By Lemma 12, the probability that $p_x < \alpha/2$ is at most $1/3$. So the probability of existence of a $\beta$-bad set is at most $1/3$, which completes the proof. $\square$

4 Approximate Reconstruction on General Graphs 

In this section, we study the approximate version of the metric reconstruction 
problem. We first give an algorithm for the approximate reconstruction, and 
then show that this algorithm is near-optimal by providing a query lower bound 
which coincides with its query complexity up to a logarithmic factor. 

Definition 13. Let $f$ be any sublinear function of $n$. An $f$-approximation $\hat{\delta}$ of the metric $\delta$ is such that, for every $(u, v) \in V^2$, $\hat{\delta}(u, v) \le \delta(u, v) \le f \cdot \hat{\delta}(u, v)$.

The following algorithm Approx-Reconstruction$(V)$ receives the vertex set $V$ and samples an expected number of $O(n(\log n)/f)$ nodes. For every sampled node $u$, it makes all queries related to $u$ and provides an estimate $\hat{\delta}(v, w)$ for every $v$ at distance less than $f/2$ from $u$ and every $w \in V \setminus \{v\}$.

Approx-Reconstruction(V)
1   while δ̂ is not defined on every pair of nodes
2       do u ← a node chosen from V uniformly at random
3          for every v ∈ V
4              do Query(u, v) and let δ̂(u, v) ← δ(u, v)
5          S_u ← {v : δ(u, v) < f/2}
6          for v ∈ S_u \ {u}
7              do for w ∈ S_u \ {v}
8                     do δ̂(v, w) ← 1
9                 for w ∉ S_u
10                    do δ̂(v, w) ← δ(u, w) − δ(u, v)
11  return δ̂
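The pseudocode above can be turned into the following Python sketch. It assumes an integer $f \ge 2$ (the upper-bound step of the proof below uses $2\delta(u, v) \le f - 1$ for $v \in S_u$, which needs integrality), a function `query(u, v)` returning $\delta(u, v)$, and stores $\hat{\delta}$ symmetrically in a dictionary; all names are illustrative. Later samples may overwrite an estimate, which is harmless because every value written satisfies the approximation guarantee.

```python
import random


def approx_reconstruction(vertices, query, f, rng=None):
    """Sketch of Approx-Reconstruction(V) for an integer f >= 2.

    Returns (est, samples) with est[(v, w)] <= delta(v, w) <= f * est[(v, w)]
    for all v != w; the number of queries is n * samples.
    """
    rng = rng or random.Random()
    vertices = list(vertices)
    n = len(vertices)
    est, samples = {}, 0
    while len(est) < n * (n - 1):                   # line 1: some pair still undefined
        u = rng.choice(vertices)                    # line 2
        samples += 1
        du = {v: query(u, v) for v in vertices}     # lines 3-4: n queries
        for v in vertices:
            if v != u:
                est[(u, v)] = est[(v, u)] = du[v]   # exact distances from u
        S_u = {v for v in vertices if du[v] < f / 2}           # line 5
        for v in S_u - {u}:                                    # line 6
            for w in S_u - {v}:                                # lines 7-8
                est[(v, w)] = est[(w, v)] = 1
            for w in vertices:
                if w not in S_u:                               # lines 9-10
                    est[(v, w)] = est[(w, v)] = du[w] - du[v]
    return est, samples


# Cycle of 40 vertices, f = 6.
N, f = 40, 6
ring = lambda u, v: min(abs(u - v), N - abs(u - v))
est, samples = approx_reconstruction(range(N), ring, f, rng=random.Random(2))
ok = all(est[(v, w)] <= ring(v, w) <= f * est[(v, w)]
         for v in range(N) for w in range(N) if v != w)
print(ok)  # True
```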



Theorem 14. The randomized algorithm Approx-Reconstruction$(V)$ computes an $f$-approximation $\hat{\delta}$ of the metric $\delta$ using an expected number of $O(n^2 (\log n)/f)$ queries.

Proof. First we prove that for every $(v, w)$, we have $\hat{\delta}(v, w) \le \delta(v, w) \le f \cdot \hat{\delta}(v, w)$. Pairs set in line 4 are exact, so there are two remaining cases:

Case 1: $w \in S_u \setminus \{v\}$ (line 7). Then $\hat{\delta}(v, w) = 1 \le \delta(v, w) \le \delta(u, v) + \delta(u, w) < f/2 + f/2 = f = f \cdot \hat{\delta}(v, w)$, because $v$ and $w$ are in $S_u$.

Case 2: $w \notin S_u$ (line 9). On the one hand, by the triangle inequality, $\delta(v, w) \ge \delta(w, u) - \delta(v, u) = \hat{\delta}(v, w)$. On the other hand, by the triangle inequality, $\delta(v, w) \le (\delta(u, w) - \delta(u, v)) + 2\delta(u, v)$. The first term is $\hat{\delta}(v, w)$. The second term, by definition of $S_u$ (taking $f$ to be an integer), is at most $f - 1$. Since $v \in S_u$ and $w \notin S_u$, we have $\hat{\delta}(v, w) = \delta(u, w) - \delta(u, v) \ge 1$, so the second term can be bounded by $f - 1 \le (f - 1) \cdot \hat{\delta}(v, w)$. Adding completes the proof of the upper bound.

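Next, we analyze the query complexity of the algorithm. Since $G$ is connected, for every node $v$ there are at least $f/2$ nodes $u$ such that $v \in S_u$. Let $X$ denote all samples during the algorithm. The number of queries is $n|X|$, and its expectation is $n \sum_t \Pr[|X| > t]$. Let $X_t$ denote the first $t$ samples chosen. We have:

$$\Pr[|X| > t] \le \Pr[\exists v, \forall u \in X_t, v \notin S_u] \le \begin{cases} 1 & \text{if } t < 2n(\ln n)/f, \\ n \cdot \Pr[\forall u \in X_t, v \notin S_u] \text{ for a fixed } v & \text{otherwise.} \end{cases}$$

By independence, $\Pr[\forall u \in X_t, v \notin S_u] \le (1 - (f/2)/n)^t \le e^{-tf/(2n)}$. Thus

$$E[\#\text{queries}] \le n \cdot \frac{2n \ln n}{f} + n^2 \sum_{t \ge 2n(\ln n)/f} e^{-tf/(2n)} = O(n^2 (\log n)/f),$$

since the geometric sum is at most $\frac{1/n}{1 - e^{-f/(2n)}} = O(1/f)$. $\square$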

On the lower bound side, Reyzin and Srivastava proved a tight $\Omega(n^2)$ bound for the exact reconstruction problem, as in the following proposition.

Proposition 15 ([18]). Any deterministic or randomized algorithm for the exact graph reconstruction problem requires $\Omega(n^2)$ queries.

We extend the proof of Proposition 15 to get a lower bound for approximate reconstruction as in Theorem 16, whose proof is in Appendix C.

Theorem 16. Any deterministic or randomized approximation algorithm requires $\Omega(n^2/f)$ queries to compute an $f$-approximation of the graph metric.
A Implementation of Balanced-Partition Algorithm

Definition 17. Define a boundary cycle $(b_1, \ldots, b_t)$ to be a cycle of vertices along the whole boundary of the unbounded face, where $b_i$ and $b_{i+1}$ are connected by an edge for every $i \in [1, t]$, with $b_{t+1} = b_1$.

Notice that a boundary cycle may contain several occurrences of the same vertex. For example, a boundary cycle of the graph in Figure 4 may be $(a, c, d, e, f, c, b)$. Let $n_v$ be the number of connected components in $G[U]$ after the removal of the vertex $v$. It is easy to see that $n_v$ is the number of occurrences of $v$ in any boundary cycle of $G[U]$. The node $v$ is said to be a cut vertex in $G[U]$ when $n_v \ge 2$.




Fig. 4. Example of an outerplanar graph 



A.1 Find a Shortest Path

Let $U$ be a self-contained subset of $V$. Let $a$ and $b$ be two nodes in $U$. The following algorithm Shortest-Path$(a, b, U)$ returns some shortest path between $a$ and $b$ using $O(|U| \log |U|)$ queries.

Shortest-Path(a, b, U)
1   if δ(a, b) = 1
2       then return path (a, b)
3   Query(U, a)
4   Query(U, b)
5   T ← {u ∈ U | δ(u, a) + δ(u, b) = δ(a, b)}
6   c ← any node in T such that δ(c, a) = ⌊δ(a, b)/2⌋
7   U_1 ← {u ∈ T | δ(u, a) < δ(c, a)}
8   U_2 ← {u ∈ T | δ(u, a) > δ(c, a)}
9   P_1 ← Shortest-Path(a, c, U_1)
10  P_2 ← Shortest-Path(c, b, U_2)
11  return the concatenation of P_1 and P_2

First, we make $2|U|$ queries to get $\delta(u, a)$ and $\delta(u, b)$ for every $u \in U$. The set $T$ is the set of nodes in at least one shortest path between $a$ and $b$. Let $c$ be the node in the middle of some shortest path between $a$ and $b$. Then we recursively construct a shortest path between $a$ and $c$ and a shortest path between $c$ and $b$. The concatenation of these two paths is a shortest path between $a$ and $b$.

During the recursion, the distance between the two given endpoints is halved at every level, so there are $O(\log |U|)$ levels of recursion. The number of queries at every level is $O(|U|)$, since the sets in the same level of recursion are disjoint. So the total query complexity is $O(|U| \log |U|)$.
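A direct Python transcription of Shortest-Path is sketched below. It assumes only a function `query(u, v)` returning $\delta(u, v)$ and a self-contained $U$ containing $a$ and $b$; the demo oracle built from a BFS table, and the function names, are illustrative. Because $U_1$ and $U_2$ use strict inequalities, the middle node $c$ belongs to neither set, which keeps the sets at each recursion level disjoint.

```python
from collections import deque


def shortest_path(a, b, U, query):
    """Sketch of Shortest-Path(a, b, U) using distance queries only."""
    d_ab = query(a, b)
    if d_ab <= 1:                                      # lines 1-2 (d_ab == 0 only if a == b)
        return [a] if d_ab == 0 else [a, b]
    da = {u: query(u, a) for u in U}                   # line 3
    db = {u: query(u, b) for u in U}                   # line 4
    T = [u for u in U if da[u] + db[u] == d_ab]        # line 5: nodes on a shortest a-b path
    half = d_ab // 2
    c = next(u for u in T if da[u] == half)            # line 6
    U1 = [u for u in T if da[u] < half]                # line 7
    U2 = [u for u in T if da[u] > half]                # line 8
    P1 = shortest_path(a, c, U1, query)                # line 9
    P2 = shortest_path(c, b, U2, query)                # line 10
    return P1 + P2[1:]                                 # line 11, without repeating c


def bfs_table(adj):
    """All-pairs BFS distances; stands in for the oracle in the demo."""
    table = {}
    for src in adj:
        dist, frontier = {src: 0}, deque([src])
        while frontier:
            x = frontier.popleft()
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    frontier.append(y)
        table[src] = dist
    return table


# 4x4 grid; vertex r*4 + c.
g = {
    r * 4 + c: [rr * 4 + cc
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= rr < 4 and 0 <= cc < 4]
    for r in range(4) for c in range(4)
}
table = bfs_table(g)
path = shortest_path(0, 15, list(g), lambda u, v: table[u][v])
print(path[0], path[-1], len(path) - 1)  # 0 15 6
```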

A.2 Partition by Node

Let $U$ be a self-contained subset of $V$ and let $x$ be a node in $U$. After the removal of $x$, the set $U \setminus \{x\}$ is separated into $n_x$ connected components $S^*_{x,1}, \ldots, S^*_{x,n_x}$. Recall that $\{S_{x,1}, \ldots, S_{x,n_x}\}$ is the partition of $U$ by $x$, where $S_{x,i} = S^*_{x,i} \cup \{x\}$.

The following algorithm Partition-by-Node$(x, U)$ computes this partition using $O(\Delta \cdot |U|)$ queries. In the algorithm, $Y$ is the set of neighbors of $x$. For any $y_1$ and $y_2$ in $Y$, we say that they are consecutive neighbors if there exists a path between $y_1$ and $y_2$ that goes through neither $x$ nor any node in $Y \setminus \{y_1, y_2\}$. Two neighbors $y$ and $z$ are in the same $S^*_{x,i}$ (for some $i \in [1, n_x]$) if and only if there exist $v_1, \ldots, v_k$ such that $v_1 = y$, $v_k = z$, and every $(v_i, v_{i+1})_{1 \le i < k}$ is a pair of consecutive neighbors. The algorithm maintains a disjoint-set data structure with Union operations on consecutive neighbors. This leads to the partition $\{Y_1, \ldots, Y_{n_x}\}$ of $Y$, which is the same as $\{S^*_{x,1} \cap Y, \ldots, S^*_{x,n_x} \cap Y\}$. We then classify every $u \in U \setminus \{x\}$ into some $S^*_{x,i}$, according to which $Y_i$ contains the nodes of $Y$ nearest to $u$.

Partition-by-Node(x, U)
1   Query(U, x)
2   Y ← {u ∈ U | δ(x, u) = 1}
3   Query(U, Y)
4   for u ∈ U \ {x}
5       do d_u ← min_{y ∈ Y} δ(u, y)
6          A_u ← {y ∈ Y | δ(u, y) = d_u}
7          if |A_u| = 2
8              then (a_1, a_2) ← the only two elements in A_u
9                   Union(a_1, a_2)
10         if |A_u| = 1
11             then a ← the only element in A_u
12                  for y ∈ Y
13                      do if δ(u, y) = d_u + 1
14                             then Union(a, y)
15  {Y_1, ..., Y_{n_x}} ← partition of Y by the disjoint-set data structure
16  for i ← 1 to n_x
17      do S*_{x,i} ← {u ∈ U \ {x} | A_u ⊆ Y_i}
18  return {S*_{x,1} ∪ {x}, ..., S*_{x,n_x} ∪ {x}}

We will show that the partition $\{Y_1, \ldots, Y_{n_x}\}$ in line 15 is indeed the partition $\{S^*_{x,1} \cap Y, \ldots, S^*_{x,n_x} \cap Y\}$. Consider any consecutive neighbors $y_1$ and $y_2$. Let $v_1, \ldots, v_l$ be a shortest path between $y_1$ and $y_2$ and let $u = v_{\lceil l/2 \rceil}$. When $l$ is odd, $\delta(u, y_1) = \delta(u, y_2) = d_u$, so $(y_1, y_2)$ is added to the disjoint-set structure in line 9; when $l$ is even, $\delta(u, y_1) = d_u$ and $\delta(u, y_2) = d_u + 1$, so $(y_1, y_2)$ is added to the disjoint-set structure in line 14. Thus every pair of consecutive neighbors is added to the disjoint-set structure. For any $y \in Y$ and $z \in Y$ such that $y$ and $z$ are in the same $S^*_{x,i}$, there exists a path $v_1, \ldots, v_k$ with $v_1 = y$ and $v_k = z$, such that every $(v_i, v_{i+1})$ is a pair of consecutive neighbors and thus added to the disjoint-set structure. So $y$ and $z$ are in the same part. On the other hand, it is easy to see that any nodes $y$ and $z$ in Union$(y, z)$ are in the same $S^*_{x,i}$ for some $i \in [1, n_x]$. So the partition of $Y$ in the algorithm is indeed $\{S^*_{x,1} \cap Y, \ldots, S^*_{x,n_x} \cap Y\}$.

From this partition, we can construct $S^*_{x,i}$ easily (lines 16-17) by noticing that every $u \in S^*_{x,i}$ satisfies $\delta(u, Y_i) < \delta(u, Y_j)$ for every $j \ne i$.
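The following Python sketch transcribes Partition-by-Node, using a small union-find (disjoint-set) structure with path halving. It assumes a function `query(u, v)` returning $\delta(u, v)$ on a self-contained outerplanar $U$; the names are illustrative. As in the pseudocode, $x$ itself is skipped in the classification loop.

```python
from collections import deque


def partition_by_node_queries(x, U, query):
    """Sketch of Partition-by-Node(x, U) from distance queries only."""
    U = list(U)
    Y = [u for u in U if query(x, u) == 1]                 # lines 1-2: neighbors of x
    dist = {y: {u: query(u, y) for u in U} for y in Y}     # line 3: Query(U, Y)
    parent = {y: y for y in Y}                              # disjoint-set forest on Y

    def find(y):
        while parent[y] != y:
            parent[y] = parent[parent[y]]                   # path halving
            y = parent[y]
        return y

    def union(y, z):
        parent[find(y)] = find(z)

    A = {}
    for u in U:
        if u == x:
            continue
        du = min(dist[y][u] for y in Y)                     # line 5
        A[u] = [y for y in Y if dist[y][u] == du]           # line 6
        if len(A[u]) == 2:                                  # lines 7-9
            union(A[u][0], A[u][1])
        elif len(A[u]) == 1:                                # lines 10-14
            for y in Y:
                if dist[y][u] == du + 1:
                    union(A[u][0], y)
    groups = {}
    for y in Y:                                             # line 15
        groups.setdefault(find(y), set()).add(y)
    return [{u for u in A if set(A[u]) <= Yi} | {x}         # lines 16-18
            for Yi in groups.values()]


def bfs_table(adj):
    """All-pairs BFS distances; stands in for the oracle in the demo."""
    table = {}
    for src in adj:
        dist, frontier = {src: 0}, deque([src])
        while frontier:
            x = frontier.popleft()
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    frontier.append(y)
        table[src] = dist
    return table


# Two triangles sharing vertex 0, plus a pendant edge 4-5.
bowtie = {0: [1, 2, 3, 4], 1: [0, 2], 2: [0, 1], 3: [0, 4], 4: [0, 3, 5], 5: [4]}
table = bfs_table(bowtie)
parts = partition_by_node_queries(0, bowtie, lambda u, v: table[u][v])
print(sorted(sorted(p) for p in parts))  # [[0, 1, 2], [0, 3, 4, 5]]
```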

A.3 Ordering of Neighbors

Let $U$ be a self-contained subset of $V$ and let $x$ be a non-cut vertex in $G[U]$. Let $Y$ be the set of neighbors of $x$ in $U$. Since $G[U]$ is outerplanar, there exists an order $y_1, \ldots, y_\lambda$ of the elements in $Y$ such that every $(y_i, y_{i+1})_{1 \le i < \lambda}$ is a pair of consecutive neighbors. Such an order is unique up to symmetry, i.e., the only other order is $(y_\lambda, \ldots, y_1)$. In fact, for every $y_i$ and $y_j$ under the above order with $|i - j| > 1$, $y_i$ and $y_j$ are not consecutive neighbors, since otherwise $G[U]$ contains a subgraph homeomorphic to $K_4$, which contradicts the fact that $G[U]$ is outerplanar.

The algorithm Neighbors-in-Order$(x, U)$ makes $O(\Delta \cdot |U|)$ queries and returns the $\lambda$ neighbors $(y_1, \ldots, y_\lambda)$ in order.

Neighbors-In-Order(x, U)

 1  Query(U, x)
 2  Y ← {u ∈ U | δ(x, u) = 1}
 3  Query(U, Y)
 4  R ← ∅
 5  for u ∈ U
 6      do d_u ← min_{y ∈ Y} {δ(u, y)}
 7         A_u ← {y ∈ Y | δ(u, y) = d_u}
 8         if |A_u| = 2
 9            then (a_1, a_2) ← the only two elements in A_u
10                 R ← R ∪ {(a_1, a_2)}
11         if |A_u| = 1
12            then a ← the only element in A_u
13                 for y ∈ Y
14                     do if δ(u, y) = d_u + 1
15                           then R ← R ∪ {(a, y)}
16  (y_1, ..., y_λ) ← an ordering of Y s.t. for every 1 ≤ i < λ, (y_i, y_{i+1}) ∈ R
17  return (y_1, ..., y_λ)
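Below is a minimal Python sketch of Neighbors-In-Order. The graph is assumed to be an adjacency dictionary. A hypothetical helper `make_oracle` stands in for Query by computing exact BFS distances, cached per source. The sketch therefore mirrors lines 1-17 but does not model the $O(\Delta \cdot |U|)$ query count. Line 16 is realized by walking the path that the pairs in $R$ form on $Y$.

```python
from collections import deque


def make_oracle(adj):
    """Hypothetical stand-in for Query: exact BFS distances, cached per source."""
    cache = {}

    def delta(u, v):
        if u not in cache:
            dist = {u: 0}
            queue = deque([u])
            while queue:
                a = queue.popleft()
                for b in adj[a]:
                    if b not in dist:
                        dist[b] = dist[a] + 1
                        queue.append(b)
            cache[u] = dist
        return cache[u][v]

    return delta


def neighbors_in_order(delta, x, U):
    """Neighbors of x in U, ordered so that adjacent entries are consecutive neighbors."""
    Y = [u for u in U if delta(x, u) == 1]              # lines 1-2
    R = set()                                            # line 4
    for u in U:                                          # line 5
        d_u = min(delta(u, y) for y in Y)                # line 6
        A_u = [y for y in Y if delta(u, y) == d_u]       # line 7
        if len(A_u) == 2:                                # lines 8-10
            R.add(frozenset(A_u))
        if len(A_u) == 1:                                # lines 11-15
            a = A_u[0]
            for y in Y:
                if delta(u, y) == d_u + 1:
                    R.add(frozenset((a, y)))
    # Line 16: the pairs in R form a path on Y; walk it from one endpoint.
    link = {y: [] for y in Y}
    for pair in R:
        a, b = tuple(pair)
        link[a].append(b)
        link[b].append(a)
    order = [next((y for y in Y if len(link[y]) <= 1), Y[0])]
    while len(order) < len(Y):
        prev = order[-2] if len(order) > 1 else None
        order.append(next(w for w in link[order[-1]] if w != prev))
    return order                                         # line 17
```

On a fan, where $x$ is adjacent to every vertex of the path $y_1 y_2 y_3 y_4$, the sketch returns the path order or its reverse. This matches the uniqueness up to symmetry noted above.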



The correctness of the algorithm follows from Proposition 18.



Proposition 18. A pair $(u, v)$ is in $R$ iff $u$ and $v$ are consecutive neighbors.

Proof. The same argument as in the last section shows that if $u$ and $v$ are consecutive neighbors, then $(u, v) \in R$. So we only need to prove the other direction: during the for loop over every $u \in U$ (lines 5-15), the only pairs added to $R$ are consecutive neighbors.

For every $i \in [1, \lambda]$, let $\{S_{y_i,1}, \ldots, S_{y_i,n_{y_i}}\}$ be the partition of $U$ by $y_i$. Let $k_i \in [1, n_{y_i}]$ be such that $x \in S_{y_i,k_i}$ and let $V_i = U \setminus S^*_{y_i,k_i}$. Define $T = S_{y_1,k_1} \cap \cdots \cap S_{y_\lambda,k_\lambda}$. Separate $T$ into $\lambda - 1$ subsets $T_1, \ldots, T_{\lambda-1}$ such that $T_i$ contains the nodes between $(x, y_i)$ and $(x, y_{i+1})$ (see Section A.4 for the definition of the partition by an edge). Then $\{V_1, \ldots, V_\lambda, T_1, \ldots, T_{\lambda-1}\}$ is a partition of $U$ as in Figure 2.¹ It is sufficient to show that, during the for loop over every $u \in V_i$ ($1 \le i \le \lambda$) and over every $u \in T_i$ ($1 \le i < \lambda$), the only pairs added to $R$ are consecutive neighbors.

Case 1: $u$ is in some $V_i$ ($1 \le i \le \lambda$). Then $y_i$ is the only node in $Y$ with $\delta(u, y_i) = d_u$. For every $y_j \in Y$ with $\delta(y_j, u) = d_u + 1$, there must be an edge $(y_i, y_j)$ in the graph, so $y_i$ and $y_j$ are consecutive neighbors.

Case 2: $u$ is in some $T_i$ ($1 \le i < \lambda$). If $\delta(u, y_i) = \delta(u, y_{i+1}) = d_u$, then $\delta(u, y_j) > d_u$ for every $y_j \in Y \setminus \{y_i, y_{i+1}\}$, so the pair $(y_i, y_{i+1})$ of consecutive neighbors is the only pair added to $R$. Otherwise $\delta(u, y_i) \neq \delta(u, y_{i+1})$ and $\min\{\delta(u, y_i), \delta(u, y_{i+1})\} = d_u$. Assume without loss of generality that $\delta(u, y_i) = d_u$. For every $y_j \in Y$ such that $\delta(u, y_j) = d_u + 1$, either $y_j = y_{i+1}$ or there is an edge $(y_i, y_j)$ in the graph. In both cases, $y_i$ and $y_j$ are consecutive neighbors. □



A.4 Partition by Edge

Let $U$ be a self-contained subset of $V$. Let $x$ and $y$ be nodes in $U$ such that $(x, y)$ is an edge in $G$, and hence also an edge in $G[U]$. Let $\{S_{x,1}, \ldots, S_{x,n_x}\}$ be the partition of $U$ by the node $x$ and $\{S_{y,1}, \ldots, S_{y,n_y}\}$ the partition of $U$ by the node $y$. Both can be computed by the algorithm in Section A.2 using $O(\Delta \cdot |U|)$ queries.



For every $S^*_{x,i}$ ($1 \le i \le n_x$) not containing $y$, remove all its elements from $U$. Likewise, for every $S^*_{y,i}$ ($1 \le i \le n_y$) not containing $x$, remove all its elements from $U$. Now $x$ and $y$ are non-cut vertices in $G[U]$.

Consider any boundary cycle of $G[U]$. Both $x$ and $y$ appear exactly once in this cycle, so they separate it into two segments. Define $A^*$ and $B^*$ to be the sets of nodes in the two segments, respectively (excluding $x$ and $y$). Let $A = A^* \cup \{x, y\}$ and $B = B^* \cup \{x, y\}$. It is easy to see that every node in $U \setminus \{x, y\}$ belongs to either $A^*$ or $B^*$. Moreover, for every $a \in A^*$ and $b \in B^*$ there is no edge between $a$ and $b$, since otherwise $G[U]$ would contain a subgraph homeomorphic to $K_4$, which contradicts the fact that $G[U]$ is outerplanar. The goal of this section is to compute the partition $\{A, B\}$.

Let $(z_1, \ldots, z_\lambda)$ be the neighbors of $x$ in $U$ in order and let $(t_1, \ldots, t_\mu)$ be the neighbors of $y$ in $U$ in order. These orders can be computed by the algorithm in Section A.3 using $O(\Delta \cdot |U|)$ queries. In any boundary cycle of $G[U]$, $(z_1, \ldots, z_\lambda)$ appears in the same order as either $(t_1, \ldots, t_\mu)$ or $(t_\mu, \ldots, t_1)$. It is not hard to distinguish these two cases using $O(\Delta \cdot |U|)$ queries. In the following, we assume without loss of generality that the first case holds. Let $i \in [1, \lambda]$ and $j \in [1, \mu]$ be such that $y = z_i$ and $x = t_j$. See Figure 5.

¹ The set $V_0$ in Figure 2 does not exist here since $x$ is a non-cut node in $U$.




Fig. 5. Partition by the edge $(x, y)$



For any node $u \in U \setminus \{x, y\}$, the following algorithm decides whether $u$ is to the left or to the right of the edge $(x, y)$. It uses at most $\Delta + 2$ queries, a constant number since the degree is bounded. Thus we obtain the partition $\{A, B\}$ using $O(\Delta \cdot |U|)$ queries.

Left-Or-Right(x, y, u)

 1  Query(x, u)
 2  Query(y, u)
 3  if δ(x, u) < δ(y, u)
 4     then for k ← 1 to λ
 5              do Query(z_k, u)
 6          Let i* be such that δ(z_{i*}, u) = min_{1 ≤ k ≤ λ} {δ(z_k, u)}
 7          if i* < i
 8             then return Right
 9             else return Left
10     else for k ← 1 to μ
11              do Query(t_k, u)
12          Let j* be such that δ(t_{j*}, u) = min_{1 ≤ k ≤ μ} {δ(t_k, u)}
13          if j* < j
14             then return Left
15             else return Right



We now show that the algorithm Left-Or-Right$(x, y, u)$ returns the correct side of $u$ with respect to $(x, y)$. First consider the case $\delta(x, u) < \delta(y, u)$. Let $i^* \in [1, \lambda]$ be such that $z_{i^*}$ is the closest to $u$ among all neighbors of $x$. The node $z_{i^*}$ is different from $y$, since we assume that $\delta(x, u) < \delta(y, u)$. If $i^* < i$, then $z_{i^*}$ is to the right of $(x, y)$, and so is $u$. If $i^* > i$, then $z_{i^*}$ is to the left of $(x, y)$, and so is $u$. Thus the algorithm returns the correct side of $u$. The case $\delta(x, u) \ge \delta(y, u)$, handled by the else branch, is similar. There $t_{j^*} \neq x$, because $\delta(t_{j^*}, u) = \delta(y, u) - 1 < \delta(x, u)$.
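The following Python sketch transcribes Left-Or-Right. It assumes the two neighbor orders are already known and have the same orientation, the "first case" above. The helper `bfs_oracle` is hypothetical: it stands in for Query with exact all-pairs BFS distances on a small graph. The indices `i` and `j` are 1-based, as in the text.

```python
from collections import deque


def bfs_oracle(adj):
    """Hypothetical stand-in for Query: all-pairs BFS distances on a small graph."""
    dist = {}
    for s in adj:
        d = {s: 0}
        queue = deque([s])
        while queue:
            a = queue.popleft()
            for b in adj[a]:
                if b not in d:
                    d[b] = d[a] + 1
                    queue.append(b)
        dist[s] = d
    return lambda u, v: dist[u][v]


def left_or_right(delta, x, y, u, z, i, t, j):
    """Side of u with respect to the edge (x, y).

    z: neighbors of x in order, with y == z[i - 1];
    t: neighbors of y in order, with x == t[j - 1];
    both orders are assumed to have the same orientation.
    """
    if delta(x, u) < delta(y, u):                                       # lines 1-3
        i_star = min(range(1, len(z) + 1), key=lambda k: delta(z[k - 1], u))   # lines 4-6
        return "Right" if i_star < i else "Left"                       # lines 7-9
    j_star = min(range(1, len(t) + 1), key=lambda k: delta(t[k - 1], u))       # lines 10-12
    return "Left" if j_star < j else "Right"                           # lines 13-15
```

As a small check, take the 6-cycle $x, a_1, a_2, y, b_1, b_2$ with the chord $(x, y)$. Use the orders $z = (a_1, y, b_2)$ and $t = (b_1, x, a_2)$, which have matching orientation. Then $a_1, a_2$ receive one label and $b_1, b_2$ the other.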

A.5 Finding Polygon

Let $U$ be a self-contained subset of $V$. Let $x, y_i, y_{i+1}$ be nodes in $U$ such that $x$ is a non-cut node in $G[U]$, and $y_i$ and $y_{i+1}$ are neighbors of $x$ that are consecutive (see Section A.2 for the definition of consecutive neighbors). Since $x$ is a non-cut node in $G[U]$, there exists some path in $G[U]$ between $y_i$ and $y_{i+1}$ that avoids $x$. Let $P$ be such a path of minimum length. Then $P$ is unique, since otherwise $G[U]$ would contain a subgraph homeomorphic to $K_{2,3}$, which contradicts the fact that $G[U]$ is outerplanar. The path $P$ and the edges $(x, y_i)$, $(x, y_{i+1})$ together form a polygon. It is the unique polygon using $(x, y_i)$ and $(x, y_{i+1})$ as edges. The following algorithm computes this polygon using $O(\Delta \cdot |U| \log |U|)$ queries.

Find-Polygon(x, y_i, y_{i+1}, U)

 1  A_1 ← the subset of U separated by the edge (x, y_i) which contains y_{i+1}
 2  A_2 ← the subset of U separated by the edge (x, y_{i+1}) which contains y_i
 3  A ← A_1 ∩ A_2
 4  Query(A, y_i)
 5  Query(A, y_{i+1})
 6  d ← min_{u ∈ A} {δ(u, y_i) + δ(u, y_{i+1})}
 7  Let z ∈ A be such that δ(z, y_i) + δ(z, y_{i+1}) = d and δ(z, y_i) = ⌊d/2⌋
 8  P_1 ← Shortest-Path(y_i, z)
 9  P_2 ← Shortest-Path(z, y_{i+1})
10  return the concatenation of P_1, P_2, (y_{i+1}, x), (x, y_i)

The set $A$ above is the set of nodes between the edges $(x, y_i)$ and $(x, y_{i+1})$. Let $d$ be the length of the shortest path $P$ between $y_i$ and $y_{i+1}$ that does not go through $x$, and let $z$ be the node in the middle of $P$. We then compute the shortest path $P_1$ between $y_i$ and $z$ and the shortest path $P_2$ between $z$ and $y_{i+1}$ using $O(|U| \log |U|) $ queries (see Section A.1). The concatenation of $P_1$ and $P_2$ is $P$. Together with the edges $(y_{i+1}, x)$ and $(x, y_i)$, we get the polygon.

A.6 Partition by Polygon

Let $U$ be a self-contained subset of $V$. Let $(q_1, \ldots, q_l)$ be an arbitrary polygon where every $q_i$ is in $U$. For every $i \in [1, l]$, let $\{S_{q_i,j}\}_{1 \le j \le n_{q_i}}$ be the partition of $U$ by the node $q_i$. Exactly one of these $n_{q_i}$ subsets contains all nodes of the polygon. Let it be $S_{q_i,m_i}$ for some $m_i \in [1, n_{q_i}]$ and define $W_i = U \setminus S_{q_i,m_i}$ (see Figure 3). Let $R = \bigcap_{1 \le i \le l} S_{q_i,m_i}$. Since $R$ is self-contained, it can be partitioned into $l$ subsets $R_1, \ldots, R_l$ by the $l$ edges of the polygon. The goal of this section is to obtain the partition $\{W_1, \ldots, W_l, R_1, \ldots, R_l\}$. Sections A.2 and A.4 give algorithms to compute any $W_i$ and any $R_i$ using $O(\Delta \cdot |U|)$ queries. So a naive algorithm for the partition has query complexity $O(l \cdot \Delta \cdot |U|)$, which is $O(\Delta \cdot |U|^2)$ when $l = O(|U|)$. Next we give an improved implementation, based on dichotomy, that uses $O(\Delta \cdot |U| \log |U|)$ queries.

First we compute the eight subsets $W_1, R_1, W_{\lfloor l/2 \rfloor}, R_{\lfloor l/2 \rfloor}, W_{\lfloor l/2 \rfloor + 1}, R_{\lfloor l/2 \rfloor + 1}, W_l, R_l$. Every node $v \in U$ outside these eight subsets belongs to one of the two subsets $Z_1 = \bigcup_{2 \le i \le \lfloor l/2 \rfloor - 1} (W_i \cup R_i)$ and $Z_2 = \bigcup_{\lfloor l/2 \rfloor + 2 \le i \le l - 1} (W_i \cup R_i)$. It is easy to see that both subsets are self-contained. In addition, we can decide whether $v$ belongs to $Z_1$ or $Z_2$ by making two queries, Query$(v, q_2)$ and Query$(v, q_l)$. If $\delta(v, q_2) < \delta(v, q_l)$, then $v$ is in $Z_1$; otherwise $v$ is in $Z_2$. Thus we obtain the sets $Z_1$ and $Z_2$ using $O(|U|)$ queries.

The following algorithm Partition-Segment$(s, t, Z)$ receives two integers $s, t \in [1, l]$ and a self-contained subset $Z \subseteq U$ such that $Z = \bigcup_{s \le i \le t} (W_i \cup R_i)$. It returns the partition $\{W_s, R_s, \ldots, W_t, R_t\}$ of $Z$.

Partition-Segment(s, t, Z)

 1  if s > t
 2     then return ∅
 3  m ← ⌊(s + t)/2⌋
 4  Compute W_m, R_m
 5  A ← (Z \ (W_m ∪ R_m)) ∪ {q_m, q_{m+1}}
 6  Query(A, q_m)
 7  Query(A, q_{m+1})
 8  A_1 ← {u ∈ A | δ(u, q_m) < δ(u, q_{m+1})}
 9  A_2 ← {u ∈ A | δ(u, q_m) > δ(u, q_{m+1})}
10  Q_1 ← Partition-Segment(s, m − 1, A_1)
11  Q_2 ← Partition-Segment(m + 1, t, A_2)
12  return Q_1 ∪ Q_2 ∪ {W_m, R_m}

Computing $W_m$ and $R_m$ takes $O(\Delta \cdot |Z|)$ queries. Each recursive call at least halves $t - s$, so there are at most $\log l \le \log |U|$ levels of recursion. At every level, the query complexity is $O(\Delta \cdot |U|)$, since the sets $Z$ in the same level are pairwise disjoint. So the total query complexity of this algorithm is $O(\Delta \cdot |U| \log |U|)$.

Let $Q_1$ (resp. $Q_2$) be the partition of $Z_1$ (resp. $Z_2$) returned by the algorithm above. Then $Q_1 \cup Q_2$, together with the eight sets computed at the beginning, gives the partition $\{W_1, R_1, \ldots, W_l, R_l\}$.
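The divide-and-conquer structure of Partition-Segment can be checked with a toy model. In the sketch below, a hypothetical `label(v)` returns the index $i$ with $v \in W_i \cup R_i$. It replaces both "Compute $W_m$, $R_m$" and the two distance comparisons of lines 6-9, so only the recursion is modeled and not the queries. The model confirms that every item lands in its own part. It also shows that the deepest non-empty recursion level is $\lfloor \log_2 l \rfloor$, consistent with the $\log l$ bound on the number of levels.

```python
def partition_segment(s, t, Z, label, depth=0, stats=None):
    """Toy model of Partition-Segment(s, t, Z).

    label(v) is a hypothetical stand-in for the distance queries: it returns the
    index i with v in W_i ∪ R_i. Returns {i: list of items in W_i ∪ R_i}.
    stats["depth"] records the deepest non-empty level of the recursion.
    """
    if s > t:                                          # lines 1-2
        return {}
    if stats is not None:
        stats["depth"] = max(stats.get("depth", 0), depth)
    m = (s + t) // 2                                   # line 3
    result = {m: [v for v in Z if label(v) == m]}      # line 4: W_m ∪ R_m
    rest = [v for v in Z if label(v) != m]             # line 5
    left = [v for v in rest if label(v) < m]           # line 8: closer to q_m
    right = [v for v in rest if label(v) > m]          # line 9: closer to q_{m+1}
    result.update(partition_segment(s, m - 1, left, label, depth + 1, stats))   # line 10
    result.update(partition_segment(m + 1, t, right, label, depth + 1, stats))  # line 11
    return result                                      # line 12
```

For instance, with $l = 10$ segments the recursion reaches depth 3, and each part receives exactly the items labeled with its index.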



B Missing proofs for Balanced-Partition Algorithm 
B.1 Proof of Lemma I



We only need to show the following. Every time the algorithm fails to provide a $\beta$-balanced partition for a sample (by executing "go to Step 1") in Step 3, 4, 7 or 8, there must be a $\beta$-bad set.
In Step 3, this happens when $x$ has exactly one neighbor in $D$; then $D \setminus \{x\}$ is a $\beta$-bad set. In Step 4, this happens when $|V_i| > \beta|U|$ for some $i \in [1, \lambda]$. Such a $V_i$ is a $\beta$-bad set since $x \notin V_i$ for every $i \in [1, \lambda]$. In Step 7, this happens when $|W_i| > \beta|U|$ for some $i \in [1, l]$. Such a $W_i$ cannot be $W_1$, because $W_1$ is the same as $V_0$, which has size $|U| - |D| + 1 \le |U| - \beta|U| + 1 < \beta|U|$, since $\beta \in (0.7, 1)$ and $|U| > 10$. Since $x \notin W_i$ for every $i \in [2, l]$, any $W_i$ with $|W_i| > \beta|U|$ is a $\beta$-bad set. In Step 8, this happens when $|R_i| > \beta|U|$ for some $i \in [1, l]$. Such an $R_i$ cannot be $R_1$ or $R_l$. Indeed, $|R_1| + |R_l| \le |U| - |D| + 4 \le |U| - \beta|U| + 4 < \beta|U|$ for $\beta \in (0.7, 1)$ and $|U| > 10$. Since $x \notin R_i$ for every $i \in [2, l - 1]$, any $R_i$ with $|R_i| > \beta|U|$ is a $\beta$-bad set.

B.2 Proof of Lemma 11

We first prove the following lemma:

Lemma 19. In any tree of bounded degree $\Delta$ which is not a singleton, there is an edge that separates the tree into two parts such that both parts contain at least a $\frac{1}{2\Delta}$ fraction of the nodes.

Proof. Let $T$ be a degree-bounded tree of size $n$, for any $n \ge 2$. For any edge $(u, v)$ in $T$, removing this edge separates $T$ into two subtrees. Let $A_{u,v}$ (resp. $B_{u,v}$) be the set of nodes in the subtree containing $u$ (resp. $v$). Let $(u^*, v^*)$ be the edge which maximizes $\min(|A_{u,v}|, |B_{u,v}|)$. We will show that both $|A_{u^*,v^*}|$ and $|B_{u^*,v^*}|$ are at least $\frac{n}{2\Delta}$.

Without loss of generality, assume that $|A_{u^*,v^*}| \ge |B_{u^*,v^*}|$. Let $w_1, \ldots, w_\lambda$ be the neighbors of $u^*$ other than $v^*$, where $\lambda \le \Delta - 1$. Then $|B_{w_i,u^*}| > |B_{u^*,v^*}|$ for every $i \in [1, \lambda]$, since $B_{u^*,v^*}$ is a strict subset of every $B_{w_i,u^*}$. On the other hand, $\min(|A_{w_i,u^*}|, |B_{w_i,u^*}|) \le \min(|A_{u^*,v^*}|, |B_{u^*,v^*}|) = |B_{u^*,v^*}|$, where the inequality follows from the definition of $(u^*, v^*)$. So $|A_{w_i,u^*}| \le |B_{u^*,v^*}|$ for every $i \in [1, \lambda]$. Since $A_{u^*,v^*} = \{u^*\} \uplus A_{w_1,u^*} \uplus \cdots \uplus A_{w_\lambda,u^*}$, we have $|A_{u^*,v^*}| = 1 + (|A_{w_1,u^*}| + \cdots + |A_{w_\lambda,u^*}|)$, which is at most $1 + (\Delta - 1)|B_{u^*,v^*}|$. Since $|A_{u^*,v^*}| + |B_{u^*,v^*}| = n$, we obtain $|B_{u^*,v^*}| \ge \frac{n-1}{\Delta} \ge \frac{n}{2\Delta}$ for any $n \ge 2$. □
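The edge $(u^*, v^*)$ of Lemma 19 can be found with one pass of subtree sizes. The sketch below roots the tree at an arbitrary node. For every edge between a node and its parent, it computes $\min(|A_{u,v}|, |B_{u,v}|)$ and keeps the maximum. The function name `best_split_edge` is hypothetical. The checks confirm the lemma's guarantee $2\Delta \cdot \min \ge n$ on small examples.

```python
from collections import defaultdict


def best_split_edge(edges):
    """Return ((u, v), m) where edge (u, v) maximizes m = min(|A_{u,v}|, |B_{u,v}|)."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    nodes = list(adj)
    n = len(nodes)
    root = nodes[0]
    parent = {root: None}
    order = [root]            # every node appears after its parent
    stack = [root]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in parent:
                parent[w] = u
                order.append(w)
                stack.append(w)
    size = dict.fromkeys(nodes, 1)  # size[u] = number of nodes in the subtree rooted at u
    for u in reversed(order):
        if parent[u] is not None:
            size[parent[u]] += size[u]
    best = None
    for u in nodes:
        if parent[u] is not None:
            # Removing (parent[u], u) leaves size[u] nodes on u's side.
            m = min(size[u], n - size[u])
            if best is None or m > best[1]:
                best = ((parent[u], u), m)
    return best
```

On a path with 7 nodes the best edge leaves 3 nodes on the smaller side. On a star with 4 leaves ($\Delta = 4$) it leaves 1 node, which still satisfies $1 \ge 5/(2 \cdot 4)$.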

Let $G = (V, E)$ be any outerplanar graph of bounded degree with $n$ vertices. When $G$ is a singleton, the result is trivial, so we assume that $n \ge 2$. When $G$ is a tree, take $z$ to be one of the endpoints of the edge satisfying the condition of Lemma 19. Then $p_z \ge \alpha_0$ for $\alpha_0 = \frac{1}{2\Delta}\left(1 - \frac{1}{2\Delta}\right)$.

Next we consider the case when $G$ is not a tree. Take $(q_1, \ldots, q_l)$ to be a cycle in $G$ of minimum length. This cycle must form a polygon, since otherwise we could separate it into two smaller cycles. We partition $U$ into $2l$ self-contained subsets $R_1, \ldots, R_l, W_1, \ldots, W_l$ (as in Section A.6). At least one of the following four cases holds:
holds: 



1. Every subset is of size smaller than $(\ldots)\,n$;

2. There exists $i \in [1, l]$ such that $|W_i| \in [(\ldots)\,n, (\ldots)\,n]$;

3. There exists $j \in [1, l]$ such that $|R_j| \in [(\ldots)\,n, (\ldots)\,n]$;

4. Exactly one subset is of size larger than $(\ldots)\,n$.
In Case 1, every $R_i \cup W_i$ ($1 \le i \le l$) has at most $(\ldots)\,n$ nodes. Hence there exists $i \in [1, l]$ such that the two subsets $A = \bigcup_{1 \le j \le i} (R_j \cup W_j)$ and $B = \bigcup_{i < j \le l} (R_j \cup W_j)$ both have more than $(\ldots)\,n$ nodes. For every $a \in A$ and every $b \in B$, every shortest path between $a$ and $b$ must go through either $q_1$ or $q_i$. Thus $\max\{p_{q_1}, p_{q_i}\} \ge \alpha_1$, where $\alpha_1 = (\ldots)$.

In Case 2, let $i \in [1, l]$ be such that $|W_i| \in [(\ldots)\,n, (\ldots)\,n]$. For every pair $(a, b) \in W_i \times (U \setminus W_i)$, the node $q_i$ lies on every shortest path between $a$ and $b$. So $p_{q_i} \ge \alpha_2$, where $\alpha_2 = 2 \cdot (\ldots) \cdot (\ldots)$.

In Case 3, let $j \in [1, l]$ be such that $|R_j| \in [(\ldots)\,n, (\ldots)\,n]$. For every pair $(a, b) \in R_j \times (U \setminus R_j)$, every shortest path between $a$ and $b$ goes through at least one of $q_j$ and $q_{j+1}$. Thus $\max\{p_{q_j}, p_{q_{j+1}}\} \ge \alpha_3$, where $\alpha_3 = (\ldots) \cdot (\ldots)$.

In Case 4, let $T_0 \subseteq U$ be the subset of size larger than $(\ldots)\,n$. We already know that $T_0$ is self-contained. If $G[T_0]$ is a tree, then by Lemma 19 there exists $z \in T_0$ that lies in at least an $\alpha_0$ fraction of the shortest paths whose endpoints are both in $T_0$. Since $|T_0| > (\ldots)\,n$, $z$ lies in at least a $(\ldots) \cdot \alpha_0$ fraction of the shortest paths whose endpoints are both in $U$. If $G[T_0]$ is not a tree, it contains a polygon. If $G[T_0]$ is in Case 1 (resp. Case 2, Case 3), then similarly there exists $z \in T_0$ such that $p_z$ is at least $(\ldots) \cdot \alpha_1$ (resp. $(\ldots) \cdot \alpha_2$, $(\ldots) \cdot \alpha_3$). It remains to treat Case 4 for $G[T_0]$. Let $T_1 \subset T_0$ be the subset of size larger than $(\ldots)\,n$, and apply the same argument to $G[T_1]$. If $G[T_1]$ is not in Case 4, we are done; otherwise we obtain $T_2$ with $|T_2| > (\ldots)\,n$, and so on. Every $T_{i+1}$ is a strict subset of $T_i$ (for $i \ge 0$). So this procedure stops after a finite number of iterations and finds a node $z$ with $p_z \ge \alpha$, where $\alpha = (\ldots) \cdot \min\{\alpha_0, \alpha_1, \alpha_2, \alpha_3\}$.



B.3 Proof of Lemma 



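Let $z$ be a node in $U$ with $p_z \ge \alpha$; such a node always exists by Lemma 11. We will show that $\mathbb{P}[\hat{p}_y \ge \hat{p}_z] < \frac{1}{2|U|}$ for any node $y$ with $p_y < \alpha/2$. This is sufficient: by the union bound over the at most $|U|$ such nodes, $\mathbb{P}[\exists y \in U \text{ s.t. } p_y < \alpha/2 \text{ and } \hat{p}_y \ge \hat{p}_z] < \frac{1}{2}$. Thus, with probability at least $\frac{1}{2}$, any node $x$ with the largest $\hat{p}_x$ satisfies $p_x \ge \alpha/2$.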

Let $y$ be any node with $p_y < \alpha/2$. For every $i \in [1, \omega]$, define the variable $Y_i \in \{0, 1\}$ such that $Y_i = 1$ if the node $y$ lies on some shortest path between $a_i$ and $b_i$, and $Y_i = 0$ otherwise. Since $\{a_i\}_{1 \le i \le \omega}$ and $\{b_i\}_{1 \le i \le \omega}$ are uniform and independent random nodes in $U$, the $\{Y_i\}_{1 \le i \le \omega}$ are independent and identically distributed random variables, and each $Y_i$ equals 1 with probability $p_y$. Hence $\mathbb{E}[Y_i] = p_y < \alpha/2$.

Similarly, define $Z_i \in \{0, 1\}$ such that $Z_i = 1$ if the node $z$ lies on some shortest path between $a_i$ and $b_i$, and $Z_i = 0$ otherwise. Then the $\{Z_i\}_{1 \le i \le \omega}$ are independent and identically distributed random variables, and each $Z_i$ equals 1 with probability $p_z$. Thus $\mathbb{E}[Z_i] = p_z \ge \alpha$.

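For every $i \in [1, \omega]$, define $T_i = Y_i - Z_i$. Then $\mathbb{E}[T_i] = \mathbb{E}[Y_i] - \mathbb{E}[Z_i] < -\alpha/2$. Let $T = \frac{1}{\omega}(T_1 + \cdots + T_\omega)$, so that $T = \hat{p}_y - \hat{p}_z$. Since the $\{T_i\}_{1 \le i \le \omega}$ are independent and identically distributed random variables, $\mathbb{E}[T] = \mathbb{E}[T_1]$. We have: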
$$\mathbb{P}[\hat{p}_y \ge \hat{p}_z] = \mathbb{P}\Big[\sum_{i=1}^{\omega} T_i \ge 0\Big] \le \mathbb{P}\Big[\,|T - \mathbb{E}[T]| \ge \frac{\alpha}{2}\Big] \le 2\exp\Big(-\frac{\omega\alpha^2}{8}\Big)$$



where the last step holds by Hoeffding's inequality, since each $T_i$ takes values in $[-1, 1]$. Take the constant $C$ large enough that $2\exp(-\omega\alpha^2/8) < \frac{1}{2|U|}$ as soon as $|U| > 1$, where $\omega = C \cdot \log |U|$. Then $\mathbb{P}[\hat{p}_y \ge \hat{p}_z] < \frac{1}{2|U|}$.
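The sampling step analyzed above can be sketched as follows. The sketch uses the fact that $v$ lies on some shortest $a$-$b$ path iff $\delta(a, v) + \delta(v, b) = \delta(a, b)$. It draws $\omega = \lceil C \ln |U| \rceil$ independent uniform pairs, counts for each node the samples in which this holds (the indicator $Y_i$ of the text), and returns a node with the largest estimate $\hat{p}$. The value `C = 50`, the seed, and the function names are illustrative assumptions; the proof only requires $C$ to be large enough. BFS from $a$ and $b$ plays the role of Query$(U, a)$ and Query$(U, b)$.

```python
import math
import random
from collections import deque


def bfs(adj, s):
    dist = {s: 0}
    queue = deque([s])
    while queue:
        a = queue.popleft()
        for b in adj[a]:
            if b not in dist:
                dist[b] = dist[a] + 1
                queue.append(b)
    return dist


def sample_central_node(adj, C=50.0, seed=0):
    """Estimate p_v from omega = ceil(C ln|U|) random pairs; return the argmax and the estimates."""
    rng = random.Random(seed)
    nodes = list(adj)
    omega = max(1, math.ceil(C * math.log(len(nodes))))
    counts = dict.fromkeys(nodes, 0)
    for _ in range(omega):
        a, b = rng.choice(nodes), rng.choice(nodes)   # independent uniform endpoints
        da, db = bfs(adj, a), bfs(adj, b)             # distances from a and from b
        for v in nodes:
            if da[v] + db[v] == da[b]:                # v lies on some shortest a-b path
                counts[v] += 1
    p_hat = {v: c / omega for v, c in counts.items()}
    return max(nodes, key=p_hat.get), p_hat
```

On a star, the center lies on almost every sampled shortest path while each leaf rarely does, so the sketch selects the center.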



C Proof of Proposition 16 



We define a certain distribution of graphs and show that, on that random input, any deterministic algorithm for the $f$-approximate metric reconstruction problem requires $\Omega(n^2/f)$ queries on average, for $n$ large enough. The result then follows by Yao's Minimax Principle [33].

To simplify the proof, let $n = 2fk + 1$ for $k \in \mathbb{N}$. We take the uniform distribution on the following set of trees, one for each $f$-tuple $(\sigma_1, \ldots, \sigma_f)$, where every $\sigma_i$ is a permutation in $S_k$. The tree $T$ has:

- one vertex $a_0$ as the root, on the first level;
- $k$ vertices $a_1, \ldots, a_k$ on the second level;
- $k$ vertices $a_{k+1}, \ldots, a_{2k}$ on the third level;
- and so on, up to $k$ vertices $a_{n-k}, \ldots, a_{n-1}$ on the $(2f+1)$-th level.

For every $l \in [2, f]$ and every $i \in [1, k]$, there is an edge between the $i$-th node on level $l$ and the $i$-th node on level $l + 1$. For every $l \in [f + 1, 2f]$ and every $i \in [1, k]$, there is an edge between the $i$-th node on level $l$ and the $\sigma_{l-f}(i)$-th node on level $l + 1$.
Every tree constructed above has $k$ branches from the root, and every branch is a path of $2f$ nodes. We will show that any deterministic algorithm requires $\Omega(n^2/f)$ queries on average to compute an $f$-approximation of the metric of $T$, for $n$ large enough.

Let $A$ be a deterministic algorithm for an $f$-approximation of the metric. Based on $A$, we can reconstruct the tree exactly as follows:

1. Execute the algorithm $A$ to get $\hat{\delta}$ as the $f$-approximation of the metric;

2. For every $u$ and $v$ on consecutive levels $l$ and $l + 1$ with $l \ge f + 1$, there is an edge between $u$ and $v$ iff $\hat{\delta}(u, v) \le 2f$.

Indeed, consider any two nodes $u$ and $v$ on consecutive levels below level $f + 1$. If they are in the same branch with respect to the root, then $\delta(u, v) = 1$, so $\hat{\delta}(u, v) \le f$. If they are in different branches, then $\delta(u, v) = \delta(a_0, u) + \delta(a_0, v) > 2f$, so $\hat{\delta}(u, v) > 2f$. So we indeed reconstruct the tree $T$ exactly based on $A$. To prove that $A$ requires $\Omega(n^2/f)$ queries on average, it therefore suffices to prove the same lower bound for any deterministic algorithm for the exact reconstruction problem.
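The hard instance and the reduction in steps 1-2 can be checked on a small case. The sketch below builds $T$ from an $f$-tuple of permutations, using 0-based positions and `(level, position)` node names. It then applies the threshold rule $\hat{\delta}(u, v) \le 2f$ between levels $l$ and $l + 1$ for $l \ge f + 1$. As an assumption for the test, an $f$-approximation means $\delta \le \hat{\delta} \le f \cdot \delta$, and only the two extreme cases $\hat{\delta} = \delta$ and $\hat{\delta} = f \cdot \delta$ are tried. All function names are hypothetical.

```python
from collections import deque


def hard_instance(k, f, perms):
    """Edges of the tree T for the f-tuple perms (each a permutation of range(k)).

    Nodes are (level, position); the root is (1, 0). Edges go from level l to l + 1.
    """
    edges = [((1, 0), (2, i)) for i in range(k)]
    for level in range(2, 2 * f + 1):
        for i in range(k):
            # Identity for levels 2..f; sigma_{level - f} for levels f+1..2f.
            j = i if level <= f else perms[level - f - 1][i]
            edges.append(((level, i), (level + 1, j)))
    return edges


def all_distances(edges):
    """All-pairs BFS distances in the (unweighted) tree."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    dist = {}
    for s in adj:
        d = {s: 0}
        queue = deque([s])
        while queue:
            a = queue.popleft()
            for b in adj[a]:
                if b not in d:
                    d[b] = d[a] + 1
                    queue.append(b)
        dist[s] = d
    return dist


def reconstruct_lower_edges(k, f, approx):
    """Step 2: between levels l and l + 1 with l >= f + 1, keep (u, v) iff approx(u, v) <= 2f."""
    return {((l, i), (l + 1, j))
            for l in range(f + 1, 2 * f + 1)
            for i in range(k) for j in range(k)
            if approx((l, i), (l + 1, j)) <= 2 * f}
```

With $k = 3$ and $f = 2$ the tree has $2fk + 1 = 13$ nodes. Both extreme approximations recover exactly the edges below level $f + 1$.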

Let $A'$ be a deterministic algorithm that reconstructs the tree $T$ exactly. We assume that $A'$ makes no redundant queries, i.e., no queries whose answers can be deduced before the query. Obviously, any query involving the root is redundant. For any two nodes $u$ and $v$, let $l_u$ and $l_v$ be their levels. The query $(u, v)$ is redundant when $l_u \le f + 1$ and $l_v \le f + 1$, since the first $f + 1$ levels of $T$ are fixed. Assume without loss of generality that $l_u \ge l_v$. Then every query $(u, v)$ satisfies $l_u > f + 1$ and $l_v \ge 2$. The answer is either $l_u - l_v$, if $u$ and $v$ are in the same branch, or $l_u + l_v - 2$, if they are in different branches. We can therefore identify the answer with Yes or No to the question: are $u$ and $v$ in the same branch?

Next, we bound the number of Yes answers received by $A'$. To do this, we introduce the component graph $H$, which represents the information from all Yes answers received by $A'$. The vertex set of $H$ is the set of all nodes of $T$ on level at least $f + 1$. Initially, the edge set of $H$ is empty. Every time a query $(u, v)$ gives a Yes answer, we add an edge to $H$:

- if $l_u > f + 1$ and $l_v > f + 1$, add the edge $(u, v)$ to $H$;

- if $l_u > f + 1$ and $2 \le l_v \le f + 1$, let $w$ be the only node on level $f + 1$ in the same branch as $v$, and add the edge $(u, w)$ to $H$.

There cannot be cycles in $H$, since otherwise some query would be redundant. The number of connected components of $H$ is at least $k$, since every connected component contains nodes from a single branch of $T$ and there are $k$ branches. Since $H$ is a forest, its number of edges equals its number of vertices minus its number of connected components. This is at most $k(f + 1) - k = kf$. There is a one-to-one correspondence between the Yes answers and the edges of $H$, so $A'$ receives at most $kf$ Yes answers.

We use a decision tree argument. Consider the decision tree of $A'$ (see Figure 6). $A'$ first queries some pair $(u_1, v_1)$. If the answer is Yes (left subtree in Figure 6), it queries some pair $(u_2, v_2)$; otherwise (right subtree in Figure 6) it queries some pair $(u_3, v_3)$, and so on. Since $A'$ is deterministic, the sequence $((u_i, v_i))_{i \ge 1}$ is fixed in advance.



Fig. 6. Decision tree of $A'$: the root queries $(u_1, v_1)$; its Yes and No children query $(u_2, v_2)$ and $(u_3, v_3)$; the next level queries $(u_4, v_4), \ldots, (u_7, v_7)$.



$A'$ stops when it is able to reconstruct the tree $T$, which corresponds to a leaf of the decision tree. The decision tree has $(k!)^f$ leaves in total, one for each $f$-tuple of permutations in $S_k$. For every leaf, there are $j \le kf$ Yes answers along the path from the root to this leaf.

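Let $h \ge 4kf$, to be defined later. The expected number of queries is

$$\mathbb{E}[\#\text{queries}] \ge h \cdot \Pr[\#\text{queries} \ge h] = h\left(1 - \frac{\#(\text{leaves at level} < h)}{(k!)^f}\right).$$

A leaf at level $< h$ is identified by its root-leaf path, a word over $\{\text{Yes}, \text{No}\}$ of length $< h$ with $j \le kf$ Yes's. Thus, using $kf \le \frac{1}{2} \cdot \frac{h}{2}$:

$$\#(\text{leaves at level} < h) \le \sum_{j \le kf} \binom{h}{j} \le 2\binom{h}{kf} \le 2\left(\frac{eh}{kf}\right)^{kf}.$$

Plugging this in, using Stirling's formula $k! \ge \sqrt{2\pi k}\,(k/e)^k$, and setting $h = k^2 f/e^2$, we get

$$\mathbb{E}[\#\text{queries}] \ge \frac{k^2 f}{e^2}\left(1 - \frac{2}{(2\pi k)^{f/2}}\right).$$

Since $k \cdot f \approx n/2$ and $e^2 < 10$, for $n$ large enough the rightmost expression is at least $n^2/(40f) = \Omega(n^2/f)$.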
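The chain of counting bounds can be checked numerically for concrete $k$ and $f$. The sketch below (the function name is hypothetical) rounds $h = k^2 f/e^2$ down to an integer, which keeps $(eh/(kf))^{kf} \le (k/e)^{kf}$. It then evaluates the exact number of words, the two upper bounds, $(k!)^f$, and the resulting lower bound on the expected number of queries. Python's exact integers (`math.comb`, `math.factorial`) avoid overflow in the binomials and factorials.

```python
import math


def counting_bound(k, f):
    """Evaluate the counting argument with h = floor(k^2 f / e^2).

    Returns h, the exact number of {Yes, No} words of length h with at most kf Yes's,
    the bounds 2*C(h, kf) and 2*(e h / kf)^kf, (k!)^f, and h * (1 - bound / (k!)^f).
    """
    h = math.floor(k * k * f / math.e ** 2)
    m = k * f
    if 4 * m > h:
        raise ValueError("need kf <= h/4, i.e. k >= 4 e^2")
    exact = sum(math.comb(h, j) for j in range(m + 1))
    two_binom = 2 * math.comb(h, m)
    power = 2 * (math.e * h / m) ** m
    leaves = math.factorial(k) ** f
    expected_lb = h * (1 - power / leaves)
    return h, exact, two_binom, power, leaves, expected_lb
```

For $k = 40$ and $f \in \{1, 2\}$, each inequality in the chain holds. The lower bound also exceeds $n^2/(40f)$ with $n = 2fk + 1$.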